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Abstract 

Background: Predicting the geographic distribution of widespread species through modeling is problematic for 
several reasons including high rates of omission errors. One potential source of error for modeling widespread 
species is that subspecies and/or races of species are frequently pooled for analyses, which may mask biologically 
relevant spatial variation within the distribution of a single widespread species. We contrast a presence-only 
maximum entropy model for the widely distributed oldfield mouse {Peromyscus polionotus) that includes all 
available presence locations for this species, with two composite maximum entropy models. The composite 
models either subdivided the total species distribution into four geographic quadrants or by fifteen subspecies to 
capture spatially relevant variation in P. polionotus distributions. 

Results: Despite high Area Under the ROC Curve (AUC) values for all models, the composite species distribution 
model of P. polionotus generated from individual subspecies models represented the known distribution of the 
species much better than did the models produced by partitioning data into geographic quadrants or modeling 
the whole species as a single unit. 

Conclusions: Because the AUC values failed to describe the differences in the predictability of the three modeling 
strategies, we suggest using omission curves in addition to AUC values to assess model performance. Dividing the 
data of a widespread species into biologically relevant partitions greatly increased the performance of our 
distribution model; therefore, this approach may prove to be quite practical and informative for a wide range of 
modeling applications. 



Background 

Species distribution modeling (SDM) has become a 
common tool for understanding spatial distribution pat- 
terns of biodiversity worldwide [1-4]. The goal of SDM 
is to build a model predicting the relative probability of 
occurrence of a species across geographic space com- 
monly using environmental data (i.e. climate, vegetation, 
soil, etc.) and a dataset of known presence or presence/ 
absence localities. The terms ecological niche model, 
environmental niche model, and species distribution 
model have all been used to describe this type of model- 
ing in the literature; for the sake of simplicity we will 
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use species distribution modeling. SDM techniques con- 
tinue to evolve with an increasingly broad range of 
applications from conservation planning [5,6], to pre- 
dicting species colonization and abundance [4,7,8], pre- 
dicting disease outbreaks [1], and understanding 
phylogeographic patterns [9]. Methods of producing 
SDMs vary with the type of data available, purpose, and 
software used. 

There is a direct link between climate and the distribu- 
tion of plant species [10]. Because climate is a causal fac- 
tor in the distribution of plant species (and plant species 
assemblages), climatic patterns at various spatial scales 
directly affect habitat types and community productivity. 
Thus, climate is considered a proxy for a given species' 
environmental niche. Because of the variables involved in 
building SDMs, it is important to keep in mind that 
SDMs are predicting a species' fundamental niche as 
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probability of occurrence [11,12], not the realized distri- 
bution, which is affected by many extrinsic factors that 
may not be accounted for in the model The fundamental 
niches of species are considered to be conserved over 
evolutionary time [11], which has allowed climate-based 
SDMs to be successful in predicting the occurrence of 
species or closely related species at previously unsampled 
localities [13-16]. 

Modeling species whose distributions span large envir- 
onmental or habitat variation may be problematic 
because distribution models tend to have higher rates of 
omission error (i.e. underprediction) in the predicted 
species distributions [2,17,18]. In such cases, the models 
may indicate regional specialization of periphery or iso- 
lated populations. Commission errors (i.e. the overpre- 
diction of distributions) may result from a restriction of 
the realized distribution due to biotic interactions or 
geographic barriers to range expansion. In an attempt to 
overcome omission errors in SDMs for widespread spe- 
cies, Osborne & Suarez-Seoane [17] modeled species 
distributions by spatially partitioning their data into geo- 
graphic quadrants and into concentric rings to model 



each data partition separately. Hernandez et al. [18] sug- 
gested that future research should focus on modeling 
broad distributions in subunits that are based on distinct 
genetic lineages or recognized subspecies. 

Most species in the genus Peromyscus are widespread, 
with a positive correlation between species range and 
number of recognized subspecies [19], suggesting that 
local specialization is common within species of this 
genus. With 15 recognized subspecies [20] and geneti- 
cally structured populations, P. polionotus is an excellent 
model species for developing new methods of data parti- 
tioning to overcome the problems associated with mod- 
eling the geographic distributions of widely distributed 
species. 

The oldfield mouse {Peromyscus polionotus), also 
known regionally as the Florida beach mouse, is wide- 
spread throughout the southeastern United States (Fig- 
ure 1). Morphological and genetic differences have been 
documented between subspecies [21-25]. Molecular evi- 
dence suggests that little or no gene flow occurs 
between the highly structured populations on islands 
along Florida's panhandle [25] or between P. p. rhoadsU 




P. p. leucocephalus 
P. p. griseobracatus' 



Figure 1 P. polionotus in southeastern United States. Distribution map of the 15 supspecies of P. polionotus redrawn from Hall (1981). 
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on Florida's central ridge, and P. p. niveiventris, on the 
Atlantic coast [21]. Highly differentiated populations of 
Peromyscus are suggested to be the result of strong local 
adaptation [19,23,26]. Coastal dune populations along 
the Gulf of Mexico are phenotypically more similar to 
populations along the Atlantic Coast (especially in coat 
color) than to neighboring populations, suggesting they 
are under similar selective pressures in their disjunct 
coastal environments [23]. Throughout its range, P. 
polionotus is threatened by development and invasive 
species exerting competition and predation pressures on 
its populations, and the subspecies P. p. decoloratus has 
been considered extinct since 1950 [19,26]. 

Given the evidence for local adaptation in this wide- 
spread species, previous research would predict that an 
SDM of the entire species would underpredict the geo- 
graphic distribution of P. polionotus. A more accurate 
prediction of the species' distribution may result from a 
biologically informed spatial partitioning of locality data. 
To test this hypothesis, we modeled the distribution of 
P. polionotus in three ways; we modeled the whole spe- 
cies distribution at once, we partitioned locality data 
into four geographic quadrants following Osborne & 
Suarez-Seoane [17], and we partitioned locality data by 
the 15 recognized subspecies of P. polionotus. 

Results 

All individual and composite models produced AUC 
values above 0.84, which are consistent with AUC values 
reported in the literature for other taxa [9,17,18,27]. 
Despite having a high AUC value (0.899), the model 
based on the entire data set failed to predict the occur- 
rence of P. polionnotus in places where it is clearly 
known to occur (Figure 2a), most obviously omitting the 
distributions of the subspecies P. p. colemani and P. p. 
polionotus. Partitioning the data by geographic quad- 
rants (Figure 2b) and by subspecies (Figure 2c) pro- 
duced models that are progressively better, both in 
terms of predicting the known distribution, and in 
terms of their AUC scores. The average AUC value of 
the individual models used to build the geographic 
quadrant composite model was 0.927, whereas the aver- 
age AUC value of the individual models that were used 
to build the subspecies composite model was 0.976. 

Partitioning the data into geographic quadrants pro- 
duced four models with AUC scores of 0.844, 0.968, 
0.905, and 0.993 (clockwise from northeast; Figure 3). 
The quadrant composite model (Figure 2b) predicted 
high probabilities of occurrence in areas that the full 
species model (Figure 2a) had omitted. However, the 
quadrant composite model (Figure 2b) showed poor 
resolution in parts of northern Georgia and peninsular 
Florida. When the dataset was partitioned according to 
the currently recognized subspecies (Figures 4 and 5), 



Speices Method 




Figure 2 Maxent distributions for P. polionotus using different 

data partitions. Predicted species distributions using presence data 

for the entire species at once (A), by dividing the presence data 

into quadrants (B), and by subspecies (C). 
I J 



each subspecies model performed well based on AUC 
scores. Twelve out of 15 models had AUC scores 
between 0.97 and 1.0; the exceptions being P. p. cole- 
mani (0.917), P. p. polionotus (0.851), and P. p. trissyl- 
lepsis (0.5). The poor performance of the model for P. p. 
trissyllepsis was due to insufficient data (n = 2) for the 
population and was omitted from the composite model. 
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Although superficially all models developed from data 
partitions seem to perform well based on AUC values, 
rates of omission between these methods show a different 
perspective. Figure 6 shows three omission curves for the 
whole-species model (Figure 6a), the quadrant method 
(Figure 6b) and the subspecies model (Figure 6c). The 
curves show omission error (Y-axis) as a function of pre- 
dicted probability of occurrence (X-axis). Better performing 
models based on the logistic output of Maxent have fewer 
omission errors even as predicted probability of occurrence 
reaches maximal values. The whole-species model (Figure 
6a) and the quadrant model (Figure 6b) have omission 
error rates that increase linearly with increasing probability 
of occurrence. In contrast, the subspecies model (Figure 
6c) has relatively low omission error rates that only begins 
to increase when predicted probability of occurrence 
reaches higher values, which is preferable. 

Discussion 

We suggest that partitioning data in a biologically mean- 
ingful way (as opposed to geographically) can help to 



overcome high omission rates in distribution models of 
widespread species [17,18,28]. Despite having a relatively 
high AUC value (0.899), the model built upon the whole 
species' distribution failed to predict known localities of P. 
polionotus, which can be observed by comparing Figure 1 
to Figure 2a. Spatially partitioning the data into quadrants 
produced a much better distribution model after combin- 
ing four regional models with AUC values ranging from 
0.844 to 0.993, which can be seen by comparing Figure 1 
to Figure 2b. The 14 AUC scores for the SDMs based on 
subspecies partitioning ranged from 0.851 to 1.0, which is 
not substantially different from AUC values obtained for 
the quadrant models. However, the accuracy and increased 
resolution of the composite of the subspecies models can 
be seen in comparing the three panels of Figure 2 and by 
comparing Figure 1 to Figure 2c. 

Limitation of AUC values for assessing predictive 
performance 

AUC values are commonly used as indicators of model fit 
[9,17,18,27], and high values for all three methods in our 
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Figure 4 Maxent distribution of P. polionotus on peninsular Florida. Predicted species distribution of P. polionotus estimated from presence 
data of each mainland and peninsular subspecies (A-G) analyzed separately. 



study would suggest that each method produced highly 
accurate models. Furthermore, the modest increase in 
AUC values with ever greater data partitioning would sug- 
gest that each successive partitioning scheme produced, at 
best, only slightly better fitting models. However, this find- 
ing is misleading when one compares the predicted distri- 
butions to the known distribution for the species. The 
trend of increasing AUC scores may indicate the direction 
of change in accuracy, but it fails to capture the magnitude 
of improvement in the predicted distributions of the quad- 
rant composite method and the subspecies composite 
method. This failure is, in part, due to the fact that the 
AUC scores for the geographic quadrant method and sub- 
species method are composites of 4 and 14 combined 
models (respectively), and the accuracy (or inaccuracy) of 
individual models is compounded when they are com- 
bined. This fact alone, however, cannot fully explain the 
observed discrepancy between the vastly improved model 
prediction and the modestly better AUC scores. 

AUC values can be misleading when assessing a mod- 
el's predictive ability for several reasons. The AUC 



measures discrimination and not accuracy per se, thus 
ignoring the goodness of fit of a model [29]. The AUC 
value also takes into account the performance of the 
model at the extreme left (as well as the right) of the 
ROC curve (see [29] for a details), a region that is not 
operationally meaningful in our case. We are only inter- 
ested in thresholds of predicted probability of occur- 
rence greater than 0.50 because that would equal the 
probability of occurrence of a null model. This inclusion 
of the area under the extreme left of the ROC curve can 
inflate AUC values, which can be further inflated when 
the total geographic extent of the model is considered. 
If the ratio between areas of presence and the total 
extent is high, true positives are more likely to occur by 
chance alone [28]. Because this ratio changes with each 
of the individual models built on different data parti- 
tions, AUC values may not be useful in accurately com- 
paring relative model performance between or among 
our subspecies and regional models. 

It is also possible that the inflation of AUC values 
observed in the models we present results from the 
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P. p. sumneri 



Figure 5 Maxent distribution of P. polionotus on Florida panhandle. Predicted species distribution of P. polionotus estimated from presence 
data of each Florida panhandle subspecies (A-G) analyzed separately. 



interaction between geographic and environmental 
space. Because of the narrow geographic space at 
which most subspecies of P. polionotus occur, locality 
information is geographically clustrered; therefore, the 
environmental space sampled by the models may show 
spatial autocorrelation in some of the environmental 
variables used. In SDM, spatial autocorrelation occurs 
when the values of the variables sampled at nearby 
locations are not independent from each other [30] 
and as a result, measures of accuracy (e.g. AUC) can 
be inflated [31,32]. In our case, the geographic cluster- 
ing of narrowly distributed subspecies of P. polionotus 
may cause spatial autocorrelation and thus inflate AUC 
values (see Figure 4 and 5). Nevertheless, within these 
narrow extents, we included samples spanning the 
entire geographic space representing a significant por- 
tion of the environmental space occupied by each sub- 
species. The resulting models are accurate to the true 
distribution of the subspecies and are able to detect 
even subtle local environmental conditions likely 
affecting each subspecies differently, despite the see- 
mingly geographic clustering. This further emphasizes 
the point that AUC values provide an unreliable way 



to accurately compare relative model performance. The 
only exceptions to the issue of inflated AUC values in 
our dataset are P. p. colemani (AUC = 0.917) and P. p. 
polionotus (AUC = 0.851), which are the only two 
widely distributed subspecies spanning a more hetero- 
geneous environmental space where locality informa- 
tion for the subspecies is not geographically clustered 
(Figure 2). Because of the larger geographic space 
occupied by these subspecies, the resulting models 
from P. p. colemani and P. p. polionotus are unlikely 
affected by spatial autocorrelation and therefore do not 
show inflated AUC values. 

Finally, the AUC does not provide information as to 
the spatial distribution of errors. It also weighs omission 
and commission errors equally, both of which vary in 
interpretive meaning and importance with the intended 
use of the model [29]. Because we do not have true 
absence data, we cannot quantify our commission error 
rate. However, the omission curve shows how well the 
model performs at different thresholds (i.e. the distribu- 
tion of omission errors). Therefore, the omission curve 
can be as important as the AUC value in terms of asses- 
sing model performance, if not more-so. A model with 
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Cummulative Threshold 

Figure 6 Omission curves as a means of assessing model 
performance. Omission curve (thin green line) for the species level 
model of P. polionotus (A), the northwest region of the quadrant 
method (B), and the single subspecies P. p. phosma of the 
subspecies method (C). Mean AUC value for replicate runs was 
0.899 (std.dev. = 0.105), 0.905 (std.dev. = 0.119), and 1.0 (std.dev. = 
0.001), respectively. 



relatively lower omission errors at higher predicted 
probabilities of occurrence is preferred. 

Quadrant versus subspecies partitioning 

Because Maxent draws pseudo-absence data at random 
to calculate AUC scores, it is possible that in our quad- 
rant analysis it drew false pseudo-absences from areas 
outside the quadrant being tested, especially in the case 
of the northwest quadrant. Whether this occurred, and 
if it did, whether it contributed to the observed under- 
prediction is debatable. First, there was far less under- 
prediction in the other three quadrants. For example, 
the model for the southwest quadrant, which includes 
peninsular Florida predicted occurrences in regions 
where the species does not occur. Second, the two 
northern quadrants cover roughly the same geographic 
area as two of the subspecies (P. p. colemani and P. p. 
polionotus), yet the two northern quadrants failed to 
predict areas of known occurrence that the two subspe- 
cies models predicted accurately. It is possible that poor 



quadrant-based models resulted because pseudo- 
absences were being generated by Maxent in areas with 
true presences for either of the two subspecies just out- 
side of the quadrant being modeled. That is, the arbi- 
trary boundary between the quadrants obfuscates 
biologically meaningful boundaries between populations 
or subspecies. Thus, in using quadrant-based partition- 
ing, niche information was lost for the two subspecies, 
which emphasizes our point that biologically relevant 
data partitioning informs species distribution models. 

Molecular data and population distribution models 

It is easy to see how molecular data, such as DNA 
sequences, can be used to delineate biologically mean- 
ingful groups (i.e. clades) within a species, and that 
those clades might be partitioned separately for species 
distribution modeling, especially if they are geographi- 
cally discrete. But, just as molecular data can improve 
methods of generating SDMs, the findings associated 
with SDMs can also inform the work done by molecular 
biologists studying population genetics, phylogenetics, or 
phylogeography. When SDMs are nonoverlapping for 
populations within a species, they may be revealing 
cryptic patterns of divergence that would be interesting 
to study with molecular data. Conversely, when molecu- 
lar data uncover population structure or limits to gene 
flow, SDMs can be used to test hypothesized mechan- 
isms of divergence such as niche differentiation. Exam- 
ining both molecular data and SDMs together has been 
explored only recently [6,9]. 

Building SDMs for P. polionotus by partitioning data 
into subspecies and building a composite distribution 
model mitigated the problem of high omission rates that 
usually occurs when modeling the distributions of 
widely distributed species. This suggests that the SDM 
based on biologically relevant partitions (subspecies in 
our case) could accommodate variability in the niches of 
subspecies, whereas modeling the whole species distri- 
bution together could not. This is supported by the fact 
that spatial partitioning of data into quadrants produced 
models that had regions of both under- and over-predic- 
tion (Figure 3), whereas the models based on partition- 
ing by subspecies showed no signs of underprediction 
and only modest overlap in the distributions of adjacent 
individual subspecies caused by overprediction (Figure 
5). The evidence of high levels of population structure 
between locally adapted populations [19,21-23,25,26] 
might be driving the improvement we see in the compo- 
site model based on subspecies distributions. In our 
case, we show that molecular data at the population 
level improved model accuracy. Furthermore, in the 
absence of detailed molecular information on the popu- 
lations studied, researchers could generate relevant data 
partitions using alternative data sources such as 
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subspecies delimitations, morophological differences or 
other phenotypic traits. 

Phylogeographic implications for P. polionotus 

Guisan & Zimmermann [28] encourage collaboration 
with evolutionary biologists and population geneticists 
in cases where widespread species are being modeled. 
More recently, Rodder et al. [9] discussed how a variety 
of techniques including molecular ecology and environ- 
mental niche modeling can be complimentary in 
answering phylogeographic questions. The case is such 
here, where our method of partitioning data was based 
largely on the literature, which includes population 
genetic studies that have been conducted on P. poliono- 
tus. Conversely, as molecular work helped to inform our 
models, our models also shed light on and confirm 
results of studies on the species' genetic structure, and 
possibly evolutionary trajectory. For example, the SDMs 
for P. p. rhoadsi and P. p. nivieventris do not overlap 
and are geographically discrete (Figure 4 and 2c), which 
is consistent with the genetic results of Degner et al. 
[21] and current taxonomy (Figure 1). The southeastern 
quadrant model, however, has very poor resolution of 
this finescale distinction (Figure 3c). Similarly, the mod- 
els for P. p. polionotus and P. p. colemani capture the 
known extent of their respective ranges (Figure 5), while 
the two northern quadrant models do not (Figure 3, a 
and 3b), implying that these two subspecies occupy dif- 
ferent climatic niches. 

Climate appears to play an important role in defining 
inland and inland vs. coastal subspecies (e.g. Figure 4A- 
D). However, despite there being genetic differences in 
the beach mouse subspecies located in the Florida pan- 
handle (Figure 5A-F), there was considerable overlap in 
their predicted distributions, suggesting that climate 
may not be the primary factor defining the range of 
these subspecies. Predicted niche overlap usually 
occurred between adjacent coastal subspecies (Figure 
5A-F) and only once between coastal and inland subspe- 
cies (see P. p. albifrons and P. p. sumneri, Figure 5F-G). 
Studies have shown that the coastal beach mouse popu- 
lations reflect patterns of local adaptation and strong 
selection favoring cryptic coloration [22-25]. Therefore, 
in cases where climatic habitat on adjacent coastal bea- 
ches might be similar, vicariance (i.e. coastal inlets) and 
strong selection for coat coloration are more likely than 
climate to maintain the distinctiveness of coastal beach 
mouse subspecies. 

Conclusions 

Using a biologically meaningful method of partitioning 
the data from widely distributed species generated a 
composite SDM of P. polionotus that more accurately 
reflected the known distribution of the species than the 



process of analyzing the whole species at once or parti- 
tioning the data into geographic quadrants. Osborne 
and Suarez-Seoane [17] note that geographic based data 
partitioning (e.g. quadrants) may not have worked well 
due to the absence of any biological basis for partition- 
ing. We contend that our study confirms that statement. 
We also provide an example of how SDMs can be both 
informed by as well as inform phylogeographic studies 
at the population and species levels. Modeling a wide- 
spread species using biologically meaningful data parti- 
tions has the potential to greatly increase the 
performance of distribution models while only requiring 
basic manipulation in GIS software. Thus, this technique 
may prove to be quite practical for a wide range of 
modeling applications. Despite the increasing use and 
popularity of ENMs, a completely objective, accurate, 
and fully accepted measure of performance of predictive 
distribution models is still elusive [9,27,29]. We suggest 
using both the AUC and omission curve on a contextual 
basis to assess model performance. 

Methods 

We created species distribution models for P. polionotus in 
Maxent using the WorldClim climate layers. Maxent uses 
the principle of maximum entropy density estimation to 
generate a probability distribution based on presence-only 
data [33,34]. It has been shown to produce more accurate 
models with lower sample sizes than other distribution 
modeling software [18,35]. We used the WorldClim Cur- 
rent BioClim climate layers at 30 arc-seconds resolution 
(about 1 km 2 ). These layers are based on data from 1950- 
2000 and comprise 19 bioclimatic variables representing 
annual trends, seasonality, and extremes of precipitation 
and temperature [36]. We used the entire set of 19 cli- 
matic variables because we did not make any a priori 
assumptions of correlation among these variables. We 
clipped the WorldClim layers in ESRI ArcGIS 9.3 to 
include the extent of the species geographic range in our 
models (N35.00, E-77.0, W-92.0, S25.00). 

Presence data was obtained from collection localities 
of museum specimens of P. polionotus identified to sub- 
species (Louisiana State Museum of Natural Science, 
Michigan State University Museum, National Museum 
of Vertebrate Zoology, American Museum of Natural 
History, University of Michigan Museum of Zoology, 
University of Kansas Biodiversity Institute, Sam Noble 
Oklahoma Museum of Natural History), found on the 
online data bases Mammal Network Information System 
[37] and Global Biodiversity Information Facility [38]. 
Records lacking GPS coordinates, but with specific writ- 
ten locality information were georeferenced following 
MaNIS protocols using Google Earth and the U.S. 
Board of Geographic Names' (BGN) Geographic Names 
Information System (GNIS) [39]. 
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To obtain a model of distribution for the entire spe- 
cies, a model representing the mean distribution was 
produced in Maxent using a cross-validation approach 
of all specimen localities. The cross-validation function 
split the data set into n samples. In each of the n repli- 
cates, a single specimen was tested sequentially against 
all remaining samples (i.e. n - 1), which formed the 
training set of localities [40]. This eliminated the need 
to partition a dataset into large training and testing sets. 
This approach is useful when dealing with especially 
small datasets, where splitting the data would result in a 
training set of insufficient size. 

Similar to Osborne and Suarez-Seoane [17], we spa- 
tially partitioned our data into geographic quadrants 
(northeast, southeast, northwest, southwest) based on 
the unweighted centroid of our dataset. Using the same 
methods as described above, we ran models for each of 
the four data partitions. We note that we only parti- 
tioned our presence data. Therefore, in this case, the 
pseudo-absences drawn by Maxent are drawn from our 
complete working extent (not merely the quadrant 
being examined in isolation). These four models were 
then combined to produce a composite model of prob- 
ability of occurrence for the entire species. This was 
done in ESRI ArcGIS using the Spatial Analyst toolbox 
to create a new raster based on the four independently 
modeled quadrants. When two or more quadrants pre- 
dicted occurrence at a single point, we used the higher 
probability of occurrence value in our composite species 
distribution. 

We produced a second composite model by partition- 
ing our presence data into the 15 recognized subspecies 
of P. polionotus and modeling the distributions of each 
subspecies separately. The subspecies P. p. trissyllepsis 
lacked sufficient data to build a functioning model, so 
that subspecies was omitted. The remaining 14 subspe- 
cies models where combined in ArcGIS, as described 
above, to produce a composite model of probability of 
occurrence for the entire species. 

The final logistic outputs of each model were used to 
assess our results. The area under the curve (AUC) of 
receiver operating characteristic (ROC) plot was used to 
evaluate model performance. The AUC is a threshold 
independent measure of model performance, where an 
AUC value of 1 indicates optimal performance, and 
AUC = 0.5 indicates a model performing no better than 
a randomly generated one. The mean and range of the 
AUC values of each group of models used in the com- 
posites were compared in an attempt to give a relative 
value of "goodness" for the two composites. 
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